Sentence Alignment Method Based on Maximum Entropy Model Using Anchor Sentences

Authors

  • Chao Che
  • Wenwen Guo
  • Jianxin Zhang
Abstract

The paper proposes a sentence alignment method based on a maximum entropy model that uses anchor sentences to align ancient and modern Chinese sentences in historical classics. The method selects as anchor sentence pairs those pairs that share the same phrase at the beginning or end of the sentence or that contain the same time phrase, and uses these anchors to divide each paragraph into several sections. The sentences within each section are then aligned by a dynamic programming algorithm according to the entropy calculated by the maximum entropy model. The maximum entropy model employs an improved Chinese co-occurrence character feature, a length feature, and a sentence alignment mode feature. The co-occurrence character feature is improved by assigning different weights to characters in different positions according to their contribution to sentence alignment. In experiments on ShiJi, the proposed method reaches a precision of 95.9% and a recall of 95.6%, significantly outperforming other sentence alignment methods.
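The pipeline in the abstract (anchor selection, splitting the paragraph at anchors, then dynamic-programming alignment scored by a maximum entropy model) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the anchor rule only checks a shared 2-character phrase at the start or end of a sentence (the time-phrase rule is omitted), the set of alignment modes, the position weighting of co-occurring characters, the length ratio, and all weights (`BIAS`, `W_COOC`, `W_LEN`, `MODE_WEIGHT`, `RATIO`) are hypothetical hand-set placeholders rather than trained parameters, and `-log P(aligned)` from a binary log-linear (maxent) model is assumed as the per-step DP cost. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Bead = Tuple[List[int], List[int]]  # (ancient sentence indices, modern sentence indices)

# Alignment modes allowed in one DP step: (ancient sentences, modern sentences).
MODES = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]

# Hypothetical hand-set weights; a real system would train these with a maxent learner.
BIAS, W_COOC, W_LEN = -1.0, 4.0, 1.0
MODE_WEIGHT = {(1, 1): 0.0, (1, 2): -0.5, (2, 1): -0.5, (1, 0): -2.0, (0, 1): -2.0}
RATIO = 1.5  # hypothetical expected modern/ancient length ratio

def is_anchor(a: str, b: str, k: int = 2) -> bool:
    """Simplified anchor rule: same k-character phrase at the start or at the end."""
    if len(a) < k or len(b) < k:
        return False
    return a[:k] == b[:k] or a[-k:] == b[-k:]

def find_anchors(src: List[str], tgt: List[str]) -> List[Tuple[int, int]]:
    """Greedy, order-preserving search for anchor sentence pairs."""
    anchors, last_j = [], -1
    for i, a in enumerate(src):
        for j in range(last_j + 1, len(tgt)):
            if is_anchor(a, tgt[j]):
                anchors.append((i, j))
                last_j = j
                break
    return anchors

def cooc_feature(s: str, t: str) -> float:
    """Position-weighted fraction of ancient characters that also occur in t."""
    if not s:
        return 0.0
    n, total, hits = len(s), 0.0, 0.0
    for i, ch in enumerate(s):
        w = 1.5 if i < 2 or i >= n - 2 else 1.0  # hypothetical: sentence ends count more
        total += w
        if ch in t:
            hits += w
    return hits / total

def length_feature(s: str, t: str) -> float:
    """Zero when len(t) == RATIO * len(s); more negative as the lengths diverge."""
    if not s or not t:
        return 0.0
    return -abs(math.log(len(t) / (RATIO * len(s))))

def pair_cost(s: str, t: str, mode: Tuple[int, int]) -> float:
    """-log P(aligned | features) under a binary log-linear (maxent) model."""
    z = (BIAS + W_COOC * cooc_feature(s, t)
         + W_LEN * length_feature(s, t) + MODE_WEIGHT[mode])
    return math.log1p(math.exp(-z))

def align_section(src: List[str], tgt: List[str]) -> List[Bead]:
    """Minimum-cost monotone alignment of one section by dynamic programming."""
    n, m, inf = len(src), len(tgt), float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):  # predecessors always come earlier in this order
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MODES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + pair_cost("".join(src[i:ni]), "".join(tgt[j:nj]), (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    beads, i, j = [], n, m  # backtrack from the bottom-right cell
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

def align_paragraph(src: List[str], tgt: List[str]) -> List[Bead]:
    """Fix anchors as 1-1 beads, then align each section between consecutive anchors."""
    beads: List[Bead] = []
    pi = pj = 0
    for ai, aj in find_anchors(src, tgt) + [(len(src), len(tgt))]:
        for s, t in align_section(src[pi:ai], tgt[pj:aj]):
            beads.append(([pi + x for x in s], [pj + y for y in t]))
        if ai < len(src):  # a real anchor, not the end-of-paragraph sentinel
            beads.append(([ai], [aj]))
        pi, pj = ai + 1, aj + 1
    return beads
```

Calling `align_paragraph(ancient, modern)` on two lists of sentence strings returns beads such as `([2], [2])` for a 1-1 pair or `([3, 4], [5])` for a 2-1 merge. Fixing anchors first keeps each DP table small and stops an early alignment error from propagating across the whole paragraph, which is the motivation the abstract gives for anchor-based sectioning.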

Similar resources

Chinese-Uyghur Sentence Alignment: An Approach Based on Anchor Sentences

This paper, which builds on previous studies on sentence alignment, introduces a sentence alignment method in which some sentences are used as “anchors” and a two step procedure is applied. In the first step, some lexical information such as proper names, technical terms, numbers and punctuation marks, location information and length information are used to generate anchor sentences that satisf...

Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approaches

Sentence compression is a task of creating a short grammatical sentence by removing extraneous words or phrases from an original sentence while preserving its meaning. Existing methods learn statistics on trimming context-free grammar (CFG) rules. However, these methods sometimes eliminate the original meaning by incorrectly removing important parts of sentences, because trimming probabilities ...

Comparison of Alignment Templates and Maximum Entropy Models for NLP

ich würde gerne von Köln nach München fahren In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language...

Utterance Segmentation Using Combined Approach Based on Bi-directional N-gram and Maximum Entropy

This paper proposes a new approach to segmentation of utterances into sentences using a new linguistic model based upon Maximum-entropy-weighted Bidirectional N-grams. The usual N-gram algorithm searches for sentence boundaries in a text from left to right only. Thus a candidate sentence boundary in the text is evaluated mainly with respect to its left context, without fully considering its rig...

Sentence Alignment of Hungarian-English Parallel Corpora Using a Hybrid Algorithm

We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm is composed of a length-based and anchor matching method that uses Named Entity recognition. This algorithm combines the speed of length-based models with the accuracy of anchor finding methods. The accuracy of finding cognates for Hungarian-English language pair is e...

Publication date: 2016